Journal of the Audio Engineering Society

The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.

The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, new products, and newsworthy developments in the field of audio.

Editor-in-Chief: Brian F.G. Katz


2026 May - Volume 74 Number 5

Papers


Perceptual Evaluation of Different Methods for Binaural Rendering of Recordings With Various Microphone Arrays

Authors: Lübeck, Tim; Scheer, Christian; Ackermann, David; Brinkmann, Fabian; Pörschmann, Christoph; Weinzierl, Stefan; Ahrens, Jens; Arend, Johannes M.


OPEN ACCESS

Binaural reproduction of microphone array recordings has become an important technology in the research and consumer sectors. Several commercially available spherical microphone arrays have been introduced over the years along with various methods for binaural rendering of array recordings. Most of these methods have been evaluated individually, typically using only one specific microphone array. However, a comprehensive and systematic perceptual evaluation combining different methods and various microphone arrays is lacking. This study presents the results of a listening experiment comparing the motion-tracked binaural method, various Ambisonic binaural decoders, and the parametric binaural rendering method COMPASS using loudspeaker orchestra recordings with six different microphone arrays from two rooms, the Berliner Philharmonie and a laboratory space resembling a small chamber music venue. The experiment assessed the binaural renderings with respect to overall listening experience and four perceptual attributes from the Spatial Audio Quality Inventory in comparison to a reference recorded with a head and torso simulator. The results provide detailed insights into which rendering method and array combination provides a high overall listening experience while preserving the assessed perceptual attributes externalization, coloration, source position, and presence. Moreover, the results indicate the extent to which the assessed perceptual attributes contribute to overall listening experience.

Download: PDF (4.34 MB)

Rectangular horns are essential components in professional audio systems where precise directivity control is important. Although a horn’s directivity is known to be influenced by its finite mounting enclosure, a shape-optimization method to improve directivity control under these realistic conditions has been lacking. To address this, an optimization method based on a hybrid model is developed. The model couples the mode-matching method for the internal sound field of the horn with the boundary element method to accurately compute the modal radiation impedance at the mouth. This provides the mode-matching method with a realistic boundary condition that fully accounts for the enclosure. The optimization method then employs this hybrid model within a gradient-based procedure to design a rectangular horn that maintains constant coverage angles in both horizontal and vertical planes over a wide frequency band. A physical prototype was manufactured, and its directivity measurements show good agreement with simulations, validating the proposed method as a predictive tool for high-performance horn design.

Methods for Pitch Analysis in Contemporary Popular Music: Multiple Pitches From Harmonic Tones in Vitalic’s Music

Authors: Deruty, Emmanuel; Meredith, David; Grachten, Maarten; Arbez-Nicolas, Pascal; Jørgensen, Andreas Hasselholt; Hansen, Oliver Søndermølle; Petersen, Christian Nørkær; Stensli, Magnus


OPEN ACCESS

This study shows that contemporary popular music with electronic elements can intentionally exploit multiple or ambiguous pitch perceptions arising from a single harmonic complex tone. The phenomenon is illustrated with examples from the work of electronic artist Vitalic and from widely used electronically mediated tones. Two listening tests were conducted: (1) evaluation of the number of simultaneous pitches perceived in sequences of quasi-harmonic tones, and (2) manual pitch transcription of selected sequences. Relationships between signal characteristics and pitch perception were subsequently analyzed. In the musical sequences under consideration, the synthesized harmonic tones were found to convey a greater number of perceived pitches than the acoustic tones. Multiple ambiguous pitches were associated with features such as prominent upper partials and specific autocorrelation profiles. In contemporary popular music with an electronic component, harmonic tones can convey multiple ambiguous pitches. The set of perceived pitches depends on both the listener and the listening conditions.

Download: PDF (13.03 MB)

This qualitative study examines immersive music-production workflows in professional commercial contexts, predominantly within Dolby Atmos environments. Using an interpretivist design, the study combined 30 semi-structured interviews with nonparticipant observations and analyzed transcripts and field notes through conventional content analysis with triangulation. Findings suggest that practice progresses through three stages: preproduction, production, and postproduction. In preproduction, teams align expectations through experiential listening, plain-language briefs, spatial storyboards, and early validation of stems and metadata. In production, placements of sound sources and room acoustics function as compositional parameters that shape localization and movement, inform recording choices that retain room character and multimicrophone flexibility, and expose limitations of monitoring and rendering tools. In postproduction, creative aims are balanced with the need for consistent translation across playback systems, supported by template-based routing, metadata policies, and explicit bed, object, and low-frequency–effects practices. Across these stages, practitioners face challenges, including knowledge gaps, renderer divergence, dynamics control, and economic pressures. Reported responses include stakeholder education, format-agnostic workflows, future-proofed session design, and early artist engagement. The study contributes a stage-based model of immersive production practice, a taxonomy of bed and object strategies, and guidance for achieving reliable translation between loudspeaker and headphone playback.

The characterization of nonlinear distortion in electronic devices, such as audio amplifiers, is traditionally based on the measurement of spectral components generated by the device and absent from the input signal. However, analysis based on nonlinear system models reveals the existence of additional distortion contributions that coincide with the fundamental frequencies and are therefore neglected by conventional metrics. This work develops an accurate and low-complexity measurement procedure to estimate these “hidden” components based on one of the most general block-oriented nonlinear models, namely the parallel Wiener–Hammerstein model. The procedure enables both the computation of more representative nonlinear distortion metrics, accounting for all distortion contributions, and the analysis of the phenomenon in the time domain. Finally, the study discusses the practical limitations of the method and presents its validation through both numerical simulations and experimental applications on real-world audio amplifiers.

Engineering Reports


Methods for Combining Time-Frequency Representations: A Python Package

Authors: Boechat, Bernardo A.; da Costa, Maurício V. M.; Biscainho, Luiz W. P.

This paper presents the main ideas behind ctfr, an extensible, user-friendly Python package for efficiently combining time-frequency representations (TFRs) of audio signals into a single representation that captures the best aspects of each, achieving high resolutions in both time and frequency. The authors develop and evaluate algorithmic tweaks and approximation schemes for existing TFR combination methods, with significant performance improvements over baseline implementations. In addition, combined TFRs are employed in training a deep learning system for note transcription from audio performances, showing improved results over traditional TFRs, thus demonstrating the effectiveness of using combination methods in audio processing and music information retrieval pipelines.

Standards and Information Documents


AES Standards Committee News

Download: PDF (267.29 KB)

Departments


Conv&Conf

Download: PDF (1.63 MB)

Book Reviews

Download: PDF (166.57 KB)

Extras


Table of Contents

Download: PDF (50.03 KB)

AES Officers, Committees, Offices & Journal Staff

